The purpose of this paper is to provision on-demand resources to end users according to their needs
by using a prediction method in the cloud computing environment. The provisioning of virtualized
resources to cloud consumers according to their needs is a crucial step in the deployment of
applications on the cloud. However, the dynamic management of resources for variable workloads
remains a challenging problem for cloud providers. This problem can be addressed by using a
prediction-based adaptive resource provisioning mechanism, which can estimate the upcoming resource
demands of applications. The present research introduces a prediction-based resource provisioning
model for the allocation of resources in advance. The proposed approach facilitates the release of
unused resources to the pool while maintaining the quality of service (QoS), which is defined based
on the prediction model used to allocate resources in advance. In this work, the model is used to
determine the future workload for user requests on web servers and its impact on achieving efficient
resource provisioning in terms of resource utilization and QoS. The main contribution of this paper
is the development of a prediction model for efficient and dynamic resource provisioning to meet the
requirements of end users.
1. INTRODUCTION

In the cloud computing environment, cloud users often have time-changing requirements for virtual 
resources. To provision resources for dynamic and uncertain changes in the workload and to react to these 
changes accordingly, the cloud provider should manage resources based on the requirement of the users. In 
the present study, a workload prediction model for dynamic and efficient resource provisioning is proposed 
to manage the number of user requests and the required resources. 

One problem with such a resource provisioning scheme is the occurrence of thrashing, in which, due
to frequent variation of the workload (number of job requests), machines are repeatedly added and released to
meet each requirement while satisfying the QoS metrics. Solving this problem requires the ability to predict
the incoming workload on the system and to allocate resources a priori by using prediction methods for the
required resources. The main contributions of this paper are (a) the design of a prediction mechanism and of
the flow of the prediction model for different periods, and (b) the use of prediction methods to determine the
workload based on a historical database.

The rest of the paper is organized as follows. Section 2 presents a summary of the related work in
this domain. Section 3 provides the domain analysis for the workload pattern. Section 4 introduces the
prediction mechanism. Section 5 describes the case study used to validate the proposed approach and
the job request requirements for resources. Section 6 presents the evaluation results for the
resource provisioning and allocation. Finally, Section 7 presents the concluding remarks and future scope.


2. RELATED WORK 

Cloud computing characterizes the delivery of computation as a service, in which resources, such as
the CPU, software, hardware, information, and devices, are granted to users as services through the Internet.
The characteristics of cloud computing, such as auto-scaling, the provisioning of services based on demand,
and the utility mechanism, have been largely adopted by different analysts [1]. Cloud computing provides a
scalable and flexible platform for meeting high-performance computing requirements [2], [3], [4].
However, the pool of resources, flexible service provisioning, and elasticity are not the only strengths of
the cloud. The cloud computing system also provisions homogeneous and heterogeneous operating system
environments by using virtualization.

In the IT industry, multiple cloud providers deliver QoS to end users according to their demand.
However, an issue that persists from the initial stage to the deployment stage is that the pattern of
access to the application by various end users varies frequently. As a result, the number of users is
unpredictable and the workload is variable during some periods. Under a static and inefficient resource
provisioning mechanism, resources are over-provisioned during periods of low demand, whereas during
periods of high demand, the available resources are inadequate. This issue leads to poor QoS and high
costs. A current research challenge is to develop an adaptive method of resource provisioning that balances
cost and performance.

Cloud computing can address the above challenge by provisioning resources dynamically and
effectively to end-user applications based on the prediction of workload and resources according to the user
request arrival rate and the service response rate. This means that additional resources can be provided during
periods of high demand and released during periods of low demand, without loss of QoS to end users [5]. The
challenge with such a dynamic resource provisioning strategy is the determination of the proper quantity of
resources to be set up and provisioned in a particular period to meet the expected QoS for a variable
workload. An ideal solution would require the capability to predict the incoming workload and required
resources in advance.

The expected outcome is the determination of the number of virtual machines that should be created,
configured, and provided to handle the variable workload. This challenge has been addressed in various
ways, such as reactive [6], proactive [6], and predictive [7] approaches. Effective resource provisioning is
not a simple task. To meet the above requirements, the following criteria should be considered
in developing the algorithm: (a) the computation of the user request rate, (b) the minimization of the request
rejection rate, (c) the average response time and the number of requests timed out, (d) the percentage of timed
out requests (PTOR) for a number of users, and (e) the accurate computation in advance of the resources
required for a variable workload. The current research presents an efficient and effective resource
provisioning algorithm that uses a prediction technique to provision and remove resources dynamically.
A strategy to improve the resource utilization is proposed and compared with the conventional approach.


3. DOMAIN ANALYSIS 

Cloud computing is mainly described by using three service models: (a) Infrastructure as
a Service (IaaS), in which cloud providers offer infrastructure resources; (b) Platform as
a Service (PaaS), in which cloud providers provide an entire cloud environment implemented
and deployed in a certain programming language for a specific type of application; and (c) Software as a
Service (SaaS), which refers to applications that can be offered to customers according to their need. The
cloud deployment models are mainly classified into four forms: (a) public cloud, which is available to
everyone; (b) private cloud, which is hosted solely for one organization; (c) community cloud, which is a
cloud environment made accessible only to a certain group of organizations or individuals in collaboration;
and (d) hybrid cloud, which refers to various interconnected clouds with the hosted applications deployed
across the cloud environment [8]. In the present work, the term “workload” refers to the number of requests
that arrive to access the resources of the cloud system.

The applications or jobs accessed by users need to be switched automatically. In the
cloud computing scenario, the workload category, cloud service models, and deployment models are
interconnected with each other, as shown in Figure 1. The application workload pattern describes diverse
user behaviors, which result in the utilization of IT resources in variable forms. The workload can be
determined based on the number of user requests, the load calculation on the servers, the network traffic, and
the data storage [9]. In the cloud computing environment, the resource provisioning depends on the incoming
workload [10]. For the prediction of the workload and resources in this study, the workload patterns are
categorized based on the incoming workload, which includes the number of user requests, the amount of
work over a specified period, the user request arrival rate, and the time between two consecutive requests
(inter-arrival time). According to the above characteristics, the workload is categorized into five types:
static, periodic, unpredictable, continuously changing, and once in a lifetime [11].


3.1. Static Workload 

A workload in which the hosted application uses resources at a constant, equal level over time is called
a static workload. In this workload type, there is no requirement to increase or decrease the processing power,
network bandwidth, data storage, or memory because the workload is constant. A static workload does not
require cloud functionality such as elasticity. Private websites and wikis are examples of sites
with a static workload.


3.2. Periodic Workload 

Periodic jobs are very common in daily routines. Yearly income tax payments, monthly utility bills,
and traffic during rush hours are examples of periodic tasks or jobs. Many people perform these tasks
within the same interval. For such periodic tasks, it is difficult to provide enough resources
for the peak load and to handle the unused resources during the non-peak load. This problem leads to over- or
under-provisioning of resources to the hosted application. Periodic tasks occur at the same interval of the day,
month, or year; however, they consist of a higher number of requests over a peak period. A periodic
workload often requires elasticity and scalability to handle the number of requests during the peak intervals.
The cloud computing workload pattern is depicted in Figure 1.


[Figure 1 diagram: workload categories (static, periodic, unpredictable, continuously changing, and
once-in-a-lifetime workloads) linked with the cloud service models and the cloud deployment types]

Figure 1. The Cloud Computing Workload Pattern 


3.3. Unpredictable Workload 

Similar to a periodic workload, an unpredictable workload consists of increasing and decreasing
incoming workloads from the users. This workload is described as unpredictable because the variations occur
randomly. As a result, cloud providers have to deal with the unstructured and unplanned provisioning of
resources to meet the changing requirements. Accurate prediction is the main challenge in determining the
scaling requirements for resources under an unpredictable workload. To achieve accurate results, constant
observation of the workload is required. Examples include unpredictable traffic and forecasting.


3.4. Continuously Changing Workload

The resource utilization may increase or decrease over time, and the workload may change
continuously. For many applications, there may be long-term changes in the workload. A continuously
changing workload can be considered as a steady growth or decline of the resource utilization. The
elasticity of the cloud enables the provisioning or decommissioning of resources for a continuously
changing workload.


3.5. Once-in-a-lifetime Workload 

A workload whose resource utilization is otherwise evenly distributed may reach a peak only once in its
lifetime. In such a special case, the resource provisioning and decommissioning can often be a manual or
dynamic task performed at a known point in time.


4. PREDICTION MECHANISM 

To predict the incoming workload and to identify the workload type, the user requests have to be
analyzed. An analysis of the historical database is also needed to meet the resource requirements a priori.
The user requests and the analysis of their needs are considered for the hosted application. Figure 2 shows
the procedure of prediction for intervals t-1 and t. The previous period (interval t-1) and the live period
(interval t) are observed for a number of active users. The resource requirements for the observed period are
monitored, collected, and entered into the historical database after the procedure. Based on the database, the
workloads for time intervals t-1 and t are calculated, and prediction methods are applied to determine the
workload for the next period (t+1).
Figure 2. The Flow of the Prediction Model

As shown in Figure 2, the prediction of the workload for the resource demands is described for different
intervals. For time interval t-1, the mean CPU workload, mean RAM workload, and mean number of active
users are calculated as follows:


$\alpha_{t-1}(\mathrm{cpu})$ = mean CPU workload at interval $t-1$

$\beta_{t-1}(\mathrm{RAM})$ = mean RAM workload at interval $t-1$

$\gamma_{t-1}(\text{active user})$ = mean number of active users at interval $t-1$

$\lambda^{w}_{t-1}$ = workload for the whole scenario at interval $t-1$    (1)


For time interval t, the mean CPU workload, mean RAM workload, and mean number of active users are
calculated as follows:


$\alpha_{t}(\mathrm{cpu})$ = mean CPU workload at interval $t$

$\beta_{t}(\mathrm{RAM})$ = mean RAM workload at interval $t$

$\gamma_{t}(\text{active user})$ = mean number of active users at interval $t$

$\lambda^{w}_{t}$ = workload for the whole scenario at interval $t$    (2)


To consider the workload at the initial level, a pseudo-code is described in Algorithm 1. 


Algorithm 1: Initial Level
Input: Number of requests
Requirement: Calculation of the inter-arrival time for the periods t-1 and t
Begin
    Monitor the active user requests.
    While (time interval is t-1)
        Calculate the inter-arrival time of the active users.
        For (each inter-arrival time in period t-1)
            Monitor the workload.
            Calculate the mean value $\lambda^{w}_{t-1}$ for week 1.
        End for
    End while
    For (each inter-arrival time in period t)
        Monitor the workload.
        Calculate the mean value $\lambda^{w}_{t}$ for week 2.
    End for
End
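
The initial-level step of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration
under stated assumptions, not the authors' implementation: the function names `inter_arrival_times` and
`mean_workload`, the representation of requests as arrival timestamps in seconds, and the use of the request
arrival rate as the workload measure are all hypothetical choices made for the example.

```python
from statistics import mean


def inter_arrival_times(arrivals):
    """Return the gaps between consecutive request arrival timestamps."""
    ordered = sorted(arrivals)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def mean_workload(arrivals):
    """Mean workload of one period.

    Assumption: the workload is taken as the request arrival rate,
    i.e. the reciprocal of the mean inter-arrival time.
    """
    gaps = inter_arrival_times(arrivals)
    if not gaps or mean(gaps) == 0:
        return 0.0
    return 1.0 / mean(gaps)


# Hypothetical arrival timestamps (seconds) for period t-1 and period t.
period_prev = [0.0, 2.0, 4.0, 6.0, 8.0]
period_curr = [0.0, 1.0, 2.0, 3.0, 4.0]

lambda_prev = mean_workload(period_prev)  # workload for period t-1
lambda_curr = mean_workload(period_curr)  # workload for period t
```

In this example the requests of period t arrive twice as fast as those of period t-1, so the computed
workload doubles between the two periods.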


For each week of the month, the mean value of the workload is calculated, and the workload for the next week
is predicted based on the current data and the historical database. Thus, the predicted value for time t+n is:


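$\lambda^{w}_{t+n} = \lambda^{w}_{1} + \lambda^{w}_{2} + \lambda^{w}_{3} + \cdots + \lambda^{w}_{n}$    (3)

$\lambda^{w}_{t+n} = \sum_{i=1}^{n} \lambda^{w}_{i}$    (4)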


Equations (3) and (4) express the summation of the previous workloads used to predict the resource demands
for the next period. The pseudo-code for the workload prediction is presented in Algorithm 2.


Algorithm 2: Workload Prediction
Input: Number of requests
Requirement: Calculation of the mean CPU workload, mean RAM workload, and mean number of active users
for periods t-1 and t
Begin
    Monitor the number of active user requests for periods t-1 and t.
    Calculate $\lambda^{w}_{t-1}$ and $\lambda^{w}_{t}$ // by using (1) and (2).
    For (each week)
        Calculate the mean workload.
        Predict the workload for the next week t+n, $\lambda^{w}_{t+n} = \sum_{i=1}^{n} \lambda^{w}_{i}$.
    End for
End
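
The prediction step can be illustrated with a short Python sketch. It assumes that each monitoring sample is a
(CPU, RAM, active users) tuple and that the prediction is the summation of the previously observed weekly
workloads, as the text describes; the function names `period_means` and `predict_next` and all sample values
are hypothetical.

```python
from statistics import mean


def period_means(samples):
    """Mean CPU load, RAM load, and active-user count over one period.

    `samples` is a list of (cpu, ram, active_users) tuples; this layout
    is an assumption made for the example.
    """
    cpu, ram, users = zip(*samples)
    return mean(cpu), mean(ram), mean(users)


def predict_next(weekly_workloads):
    """Predicted workload for the next period, taken as the summation
    of the previously observed weekly workloads."""
    return sum(weekly_workloads)


# Hypothetical monitoring samples for one week: (cpu %, ram %, users).
week = [(40, 50, 10), (60, 70, 20), (50, 60, 30)]
cpu_mean, ram_mean, users_mean = period_means(week)

# Hypothetical mean workloads (active users) for four earlier weeks.
history = [20, 25, 30, 25]
predicted = predict_next(history)
```

The predicted value would then be stored alongside the observed means so that later predictions can draw on
the same historical database.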
To allocate resources to the jobs, the number of job requests is identified for various regions, as shown in
Figure 3.

Figure 3. The Flow of the Resource Allocation


Figure 4. Number of Requests by Region 


Figure 4 shows the number of job requests by region (R). Different prediction models are applied to
the resource requirements in various periods, including the first and second moving average (FSMA),
weighted moving average (WMA), and exponential moving average (EMA) models [11]. The historical data
and the prediction model output are used as input data to improve the accuracy of the resource
provisioning strategy. The predicted value produced by the prediction model is used to provision the
resources. At the same time, the results of the provisioned resources are stored in the historical database
repository for future prediction. The prediction is done by applying various moving average methods. In
particular, the moving average methods FSMA and WMA are recommended when the short-term demand does not
fluctuate, as shown in our previous work [12]. An ontology-based dynamic resource provisioning scheme for
the public cloud was implemented in the previous work [13].


First and Second Moving Average (FSMA) method:
In the first moving average method, the i-th user's request for resources at time interval t is measured,
and N is the moving average period for time interval t [9], as described in (6):


$F_{t+1}(i) = \frac{y_t(i) + y_{t-1}(i) + \cdots + y_{t-N+1}(i)}{N}$    (6)


Here, y is the original value for each period, and t is the current period. In (6), $F_{t+1}(i)$
denotes the first moving average value for the i-th user at time t+1. The resource requirement of the i-th
user at time $t+\delta$ can be predicted as:


$y_{t+\delta}(i) = A_t(i) + \delta B_t(i)$    (7)


where $\delta$ is the predicted time sequence number, and $y_{t+\delta}(i)$ denotes the predicted value at
time $t+\delta$, as shown in (7). Then, to predict $y_{t+\delta}(i)$, the second moving average value is
measured by applying (8):


$S_{t+1}(i) = \frac{F_{t+1}(i) + F_t(i) + \cdots + F_{t-N+2}(i)}{N}$    (8)


In (7), $A_t(i)$ and $B_t(i)$ are calculated by using (9) and (10):


$A_t(i) = 2F_{t+1}(i) - S_{t+1}(i)$    (9)


$B_t(i) = \frac{2\,[F_{t+1}(i) - S_{t+1}(i)]}{N-1}$    (10)
Based on the above equations, the total resource requirement for n users is measured as:

$y_{t+\delta} = \sum_{i=1}^{n} y_{t+\delta}(i)$    (11)


The total number of resources requested by all n users at time $t+\delta$ can thus be calculated, and the
predicted values at time periods t and $t+\delta$ for all users can be determined by using (11). Further, the
resources are allocated to the users both according to their actual need and based on the predicted value,
and the results are compared.
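
The FSMA forecast can be sketched in Python as follows. This is a hedged illustration, not the authors' code:
it assumes the standard double moving average form, in which the level term is twice the first moving average
minus the second and the trend term is their difference scaled by 2/(N-1). The function names
`moving_average` and `fsma_forecast`, the user names, and the request histories are hypothetical.

```python
def moving_average(values, window):
    """Simple moving averages over every full window of `values`."""
    return [
        sum(values[k - window + 1 : k + 1]) / window
        for k in range(window - 1, len(values))
    ]


def fsma_forecast(history, window, delta=1):
    """Forecast `delta` periods ahead with first and second moving averages."""
    if window < 2 or len(history) < 2 * window - 1:
        raise ValueError("need window >= 2 and at least 2*window - 1 observations")
    first = moving_average(history, window)   # first moving averages F
    second = moving_average(first, window)    # second moving averages S
    f_latest, s_latest = first[-1], second[-1]
    level = 2 * f_latest - s_latest                    # A_t(i)
    trend = 2 * (f_latest - s_latest) / (window - 1)   # B_t(i)
    return level + delta * trend


# Hypothetical per-user request histories.
users = {"u1": [10, 12, 14, 16, 18], "u2": [5, 5, 5, 5, 5]}
per_user = {name: fsma_forecast(h, window=3) for name, h in users.items()}
total = sum(per_user.values())  # total predicted demand over all users
```

For the linearly growing history of the first user, the sketch continues the trend to the next value, while
the constant history of the second user yields a flat forecast; the per-user forecasts are then summed to
obtain the total demand.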


Weighted Moving Average (WMA) method:

Whereas the first and second moving average model assigns the same weight to each element of
the moving average database, the weighted moving average model allows any weight to be placed on each
element, provided that the weights sum to 1 [11]. Here, the forecast $F_t$ of the active user requests and
the utilization of resources is described in (12):


$F_t = w_1 A_{t-1} + w_2 A_{t-2} + \cdots + w_n A_{t-n}$    (12)


where $F_t$ = the forecast for the coming period,
n = the total number of periods in the forecast,
$A_{t-i}$ = the actual occurrence for the period t-i,
$w_i$ = the weight given to the actual occurrence for the period t-i.


Here, $\sum_{i=1}^{n} w_i = 1$.


$\mathrm{WMA} = \frac{\sum_{n} (\text{weight for period } n) \times (\text{demand in period } n)}{\sum \text{weights}}$    (13)
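
A weighted moving average of this kind can be sketched in a few lines of Python. The function name
`weighted_moving_average`, the request counts, and the weights are hypothetical; dividing by the weight total
is a design choice of this sketch that keeps the result valid even when the weights are not normalized.

```python
def weighted_moving_average(demands, weights):
    """Weighted moving average of the most recent demands.

    `demands[0]` is the most recent period and `weights[0]` its weight.
    """
    if len(demands) != len(weights):
        raise ValueError("demands and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * d for w, d in zip(weights, demands)) / total_weight


# Hypothetical request counts for the last three periods, newest first.
forecast = weighted_moving_average([120, 100, 80], [0.5, 0.3, 0.2])
```

Placing the largest weight on the newest period makes the forecast follow recent demand more closely than an
equally weighted average would.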


Exponential Moving Average (EMA) method:

The exponential moving average (EMA) method is simple in its logic and is the easiest approach for
predicting fluctuations in the short-term demand. This method is effective for both short-term and time-series
prediction. As each new piece of data is added, the oldest observation is dropped, and a new forecast is
calculated [14]. The predicted values are calculated by using the smoothing constant $\alpha$. The EMA method
is expressed as:


$F_t = F_{t-1} + \alpha (A_{t-1} - F_{t-1})$
$F_t = \alpha A_{t-1} + (1 - \alpha) F_{t-1}$    (14)


where $F_t$ = the exponentially smoothed forecast for period t,
$F_{t-1}$ = the exponentially smoothed forecast made for the prior period,
$A_{t-1}$ = the actual demand in the prior period,
$\alpha$ = the desired response rate or smoothing constant, $0 < \alpha < 1$.

This method gives a higher weight to the more recent measured values and a lower weight to the earlier
measured values. The EMA method is able to respond quickly to fluctuations in the short-term demand. If the
duration of a request from an active user is greater than the defined period, it is considered a long-term
period; if it is less, it is considered a short-term period. Long-term and short-term periods are
denoted by $L_t$ and $S_t$, respectively.
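
The EMA recurrence can be sketched in Python as follows. The function name `ema_forecasts` and the request
counts are hypothetical, and seeding the first forecast with the first observed value is an assumption of this
sketch, since the text does not state how the recurrence is initialized.

```python
def ema_forecasts(actuals, alpha, initial_forecast=None):
    """Exponentially smoothed forecasts.

    The first entry is the seed forecast; each later entry is the
    forecast produced after observing one more actual value. Seeding
    with the first actual value is an assumption.
    """
    if not actuals:
        raise ValueError("at least one observation is required")
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    forecast = actuals[0] if initial_forecast is None else initial_forecast
    forecasts = [forecast]
    for actual in actuals:
        # F_t = F_{t-1} + alpha * (A_{t-1} - F_{t-1})
        forecast = forecast + alpha * (actual - forecast)
        forecasts.append(forecast)
    return forecasts


# Hypothetical request counts per period.
series = ema_forecasts([100, 120, 80], alpha=0.5)
```

A larger smoothing constant makes the forecast react faster to the latest demand, whereas a smaller one
smooths out short-lived fluctuations.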


5. NUMBER OF JOB REQUEST REQUIREMENT FOR RESOURCES 

Here, the numbers of requests from various regions are calculated, as shown in (1) and (2). The
number of arrived user requests in the deployed cloud for education is evaluated, as shown in our previous
work, and the availability of resources in the specified regions is determined. The available resources are
generated and monitored by using the commercial Amazon EC2 cloud platform [15], [16], [17]. The capability
of the available VMs is calculated; then, the job requests are allocated to the resources. Once the
jobs are allocated to the VMs, the current load is calculated. If a VM becomes over- or under-loaded,
resources are automatically added or released as necessary. The whole procedure is described below.


Let VM = {VM_1, VM_2, VM_3, ..., VM_N} be a set of N virtual machines, and let


Task (number of job requests) = {task_1, task_2, task_3, ..., task_K} be a set of K tasks to be scheduled and
processed on the VMs.
Here, the fitness of the requests i on $VM_j$ is determined by the capacity of $VM_j$ and is
calculated by using (15):


$f_{ij} = \frac{\sum_{i=1}^{n} \text{job type length}_i}{\text{capacity}_j}$    (15)


The job type length indicates how many tasks are submitted to $VM_j$, and $\text{capacity}_j$ is the capacity
of that VM to handle the submitted jobs. The capacity of the VM is measured by applying (16):


$\text{capacity}_j = \text{Num\_proc}_j \times \text{Num\_MIPS}_j + \text{BW\_VM}_j$    (16)


where Num_proc is the number of processors in the VM, Num_MIPS indicates the number of MIPS, and
BW_VM represents the bandwidth of the VM.


$f_{ij} = \frac{\sum_{i=1}^{n} \text{job type length}_i + \text{input file length}}{\text{capacity}_j}$    (17)


In (17), the input file length is also considered to determine the length of the job before execution. Tasks
are assigned to the virtual machine with the most effective fitness value obtained from (17).
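
The capacity and fitness calculations can be sketched in Python as follows. This is an illustration under
stated assumptions: the function names, the VM names, and all numeric values are hypothetical, and treating
the lowest fitness value (the shortest estimated processing time) as the most effective one is an
interpretation made for this sketch.

```python
def capacity(num_proc, num_mips, bandwidth):
    """Capacity of a VM: processors * MIPS per processor + bandwidth."""
    return num_proc * num_mips + bandwidth


def fitness(job_lengths, input_file_length, vm_capacity):
    """Total job length plus input file length, relative to VM capacity."""
    return (sum(job_lengths) + input_file_length) / vm_capacity


def best_vm(job_lengths, input_file_length, capacities):
    """Name of the VM with the lowest fitness value.

    Assumption: the lowest value is treated as the most effective one.
    """
    return min(
        capacities,
        key=lambda name: fitness(job_lengths, input_file_length, capacities[name]),
    )


# Hypothetical VMs and job lengths (in million instructions).
capacities = {
    "vm1": capacity(num_proc=2, num_mips=1000, bandwidth=500),
    "vm2": capacity(num_proc=4, num_mips=1000, bandwidth=1000),
}
chosen = best_vm([4000, 3500], 2500, capacities)
```

With these values, the second VM has twice the capacity of the first, so the same batch of jobs yields half
the fitness value there and the batch is assigned to it.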

Load Calculation: When tasks are submitted to an underloaded VM, the current workload of all available VMs
is measured by using the information received from the database [18]. To calculate the
deviation of the load across the VMs, the following standard deviation (SD) (18) is used:


$\sigma = \sqrt{\frac{1}{n} \sum_{j=1}^{n} (X_j - \bar{X})^2}$    (18)
where $X_j$ is the processing time of $VM_j$, as described in (19):

$X_j = \frac{\sum_{i=1}^{k} \text{job type length}_i}{\text{capacity}_j}$    (19)

Then, the mean processing time of all VMs is calculated by applying (20):

$\bar{X} = \frac{\sum_{j=1}^{n} X_j}{n}$    (20)

If the SD of the VM load is smaller than or equal to the mean, then the system is in a balanced
state. On the other hand, if the SD is higher than the mean, then the system is imbalanced, and the
auto-scaling mechanism for resource provisioning is applied. Here, one VM is considered to be capable of
serving 50 requests at a time. Thus, when the number of requests reaches 50 or more, a new instance is
started and requests are migrated to it. The request arrivals are refreshed every minute. Based on the
predicted value, the resources are scaled up or down according to the predicted demand. The pseudo-code for
scaling up and down is shown in Algorithm 3.
Algorithm 3: Scaling up and down
Begin
    Monitor $\lambda^{w}_{t-1}$ and $\lambda^{w}_{t}$ for the previous and the current time, respectively.
    Predict the workload $\lambda^{w}_{t+n}$ at time t+n.
    Let R = r_i ∪ r_{i+1} ∪ r_{i+2} ∪ r_{i+3} ∪ ... ∪ r_m before scaling, where R is the set of virtual resources.
    Let $\lambda^{w}_{t+n} = \sum_{i=1}^{n} \lambda^{w}_{i} = x$.
    Let $\lambda^{w}_{t+n-1} = y$.
    If x > y, then // the workload increases.
        Add r to R (scale up the virtual resource);
    else if x < y, then // the workload decreases.
        Remove the extra r from R (scale down the virtual resource);
    else
        Refresh the resources.
End
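
The scaling decision can be sketched in Python as follows. This is a minimal sketch, not the authors'
auto-scaling code: the function names, the instance names, and the request counts are hypothetical, and sizing
the pool as one instance per started block of 50 requests is an interpretation of the stated rule that a new
instance is added once 50 requests are reached.

```python
REQUESTS_PER_VM = 50  # one VM serves 50 requests at a time, as stated in the text


def scaling_decision(predicted, previous):
    """Compare the predicted workload x with the previous workload y."""
    if predicted > previous:
        return "scale_up"
    if predicted < previous:
        return "scale_down"
    return "refresh"


def required_instances(predicted_requests, per_vm=REQUESTS_PER_VM):
    """Number of VMs for the predicted requests.

    Assumption: a new instance is started as soon as a VM reaches
    `per_vm` requests, so 50 requests already need two instances.
    """
    return predicted_requests // per_vm + 1


def rescale(pool, predicted_requests, previous_requests):
    """Return the action taken and the resized resource pool."""
    action = scaling_decision(predicted_requests, previous_requests)
    target = required_instances(predicted_requests)
    if action == "scale_up" and target > len(pool):
        # Names of the new instances are hypothetical.
        pool = pool + [f"vm{k}" for k in range(len(pool) + 1, target + 1)]
    elif action == "scale_down" and target < len(pool):
        pool = pool[:target]
    return action, pool


action, pool = rescale(["vm1"], predicted_requests=120, previous_requests=45)
```

Because the function returns a new list instead of modifying the pool in place, the caller can compare the
pool before and after scaling, for example to log which instances were added or released.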


6. RESULTS OF THE QoS METRICS FOR RESOURCE ALLOCATION 

Figure 6 shows that efficient resource provisioning with the prediction model achieves higher CPU
utilization than the existing (conventional) cloud. Here, the average CPU
utilization is obtained for 50 requests on the conventional cloud and 100 requests on the efficient cloud. The
results indicate that the efficient cloud uses a minimum number of instances but provides maximum
utilization. Figure 7 shows the response times obtained for 50 requests on the conventional
cloud and 100 requests on the efficient cloud. Figure 8 indicates that efficient resource provisioning with
the prediction model achieves higher throughput than the existing (conventional)
cloud. Here, the average throughput is measured for 50 requests on the conventional cloud and 100 requests
on the efficient cloud. The results show that the efficient cloud uses a minimum number of instances but
provides the maximum throughput.
Figure 6. Average CPU utilization for 50 requests on the conventional and 100 requests on the efficient cloud

Figure 8. Average throughputs for 50 requests on the conventional and 100 requests on the efficient cloud


7. CONCLUSIONS AND FUTURE WORK 

In this research, the design of a prediction model for efficient resource provisioning is presented. A
prediction model is an essential part of the cloud environment, in which it is used to predict the number of
user requests and the corresponding resource requirements. In the present work, the FSMA, WMA, and EMA
models are used to predict the number of user requests for cloud resources. The cloud environment is set up
on the Amazon EC2 (AWS) cloud. In addition, the auto-scaling algorithm is used in resource provisioning based
on the prediction model. The experimental results of the QoS parameters are presented and compared between
the conventional cloud and the efficient cloud before and after prediction. Based on the findings, the
proposed prediction model achieves higher efficiency in resource allocation.

In future work, we will apply the ARIMA model to improve the prediction accuracy for resource
provisioning. Further, we plan to integrate the architecture for adaptive resource provisioning by using the
workload prediction strategy.